Abstract. The author has previously extended the theory of regular and
irregular primes to the setting of arbitrary totally real number fields. It has
been conjectured that the Bernoulli numbers, or alternatively the values of
the Riemann zeta function at odd negative integers, are evenly distributed
modulo p for every p. This is the basis of a well-known heuristic, given by
Siegel in [17], for estimating the frequency of irregular primes. So far, analyses
have shown that if $\mathbb{Q}(\sqrt{D})$ is a real quadratic field, then the values of the
zeta function $\zeta_D(1 - 2m) = \zeta_{\mathbb{Q}(\sqrt{D})}(1 - 2m)$ at negative odd integers are also
distributed as expected modulo p for any p. We use this heuristic to predict the
computational time required to find quadratic analogues of irregular primes
with a given order of magnitude. We also discuss alternative ways of collecting
large amounts of data to test the heuristic.



1. Introduction 

Let $\mathbb{Q}(\sqrt{D})$ be a real quadratic field with D a positive fundamental discriminant.
In several previous papers the author has defined an analogue for the theory of
regular and irregular primes in this setting, based on the following definition:

Definition 1. Let $\zeta_D$ be the zeta function for $\mathbb{Q}(\sqrt{D})$, and let $\delta$ be equal to p - 1
unless D = p, in which case $\delta = (p - 1)/2$. We say that p is D-regular if p is
relatively prime to $\zeta_D(1 - 2m)$ for all integers m such that $2 \le 2m \le \delta - 2$ and
also p is relatively prime to $p\,\zeta_D(1 - \delta)$. The number of such zeta-values that are
divisible by p will be the index of D-irregularity of p.

(More generally, we may refer to the concept as "quadratic irregularity"; see [12,
13, 14] for more details and extensions to any totally real number field.)

According to a well-known theorem of Kummer, p divides the order of the class
group of $\mathbb{Q}(\zeta_p)$ if and only if p divides the numerator of a Bernoulli number $B_{2m}$
for some even 2m such that $2 \le 2m \le p - 3$. Such primes are called irregular; the
others are called regular. In [13], building on work of Greenberg and Kudo, the
author proved that in the setting we have described above Kummer's criterion can
be extended to give information about whether p divides the class number (that is,
the order of the class group) of $\mathbb{Q}(\sqrt{D}, \zeta_p)$. To be exact, we have:

Theorem 1 (Greenberg, Holden). Assume that p does not divide D. Then p divides
the class number of $\mathbb{Q}(\sqrt{D}, \zeta_p)$ if and only if p is not D-regular.
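Kummer's classical criterion recalled above can be checked directly for small primes. The following is a minimal sketch, not the author's implementation: the function names are hypothetical, and the Bernoulli numbers are computed with the standard recurrence in exact rational arithmetic, which is only practical for small p.

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(n):
    """Return [B_0, ..., B_n] as exact fractions (convention B_1 = -1/2)."""
    B = [Fraction(1)]
    for m in range(1, n + 1):
        s = sum(comb(m + 1, k) * B[k] for k in range(m))
        B.append(-s / (m + 1))
    return B


def index_of_irregularity(p, B):
    """Count even 2m with 2 <= 2m <= p - 3 such that p divides the numerator of B_{2m}."""
    return sum(1 for k in range(2, p - 2, 2) if B[k].numerator % p == 0)


def irregular_primes_below(limit):
    """List the irregular primes p < limit (trial division is enough at this size)."""
    B = bernoulli_numbers(limit)
    primes = [p for p in range(3, limit) if all(p % d for d in range(2, int(p**0.5) + 1))]
    return [p for p in primes if index_of_irregularity(p, B) > 0]


if __name__ == "__main__":
    print(irregular_primes_below(100))
```

For limits of this size the exact-fraction approach is adequate; the computations cited later in the paper use far more efficient methods.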
The main focus of this paper is in finding large p which are irregular for some
D. This may be useful for cryptography, in that one common way of constructing
public-key cryptographic systems is to utilize the problem of finding a discrete
logarithm in some abelian group. In order to make sure that the discrete logarithm
problem is computationally hard, one needs to know something about the structure
of the group involved, e.g. that its order is divisible by a large prime. Theorem 1 shows
that if p is a large D-irregular prime and p does not divide D, then the class group
of $\mathbb{Q}(\sqrt{D}, \zeta_p)$ may be suitable for cryptography. We will come back to this in
Section 5.



Suppose, for instance, that we want to find p of a specified size dividing $\zeta_D(1 - 2m)$
for some m such that $2 \le 2m \le \delta - 2$ or dividing $p\,\zeta_D(1 - \delta)$. (For reasons that
will become clear, we will not encounter the situation D = p in practice, so we may
focus on the case where $\delta = p - 1$. See Section 5 for the details.) More specifically,
we might fix a real number c greater than 1 and then look for m and D such that
P < p < cP and p divides $\zeta_D(1 - 2m)$ for some positive m less than or equal to
(p - 1)/2. (In practice, c = 2 would probably be the most common choice, since
that would be equivalent to specifying the size of p in number of bits.)

The algorithm that we will use to carry out this search is described in [14]. The 
algorithms there fall into two basic types. The first type calculates $\zeta_D(1 - 2m)$ in
a range of m for each D before going on to the next D, and calculates each value 
in time 0{m^^^^ D^~^°^^^) when amortized over both D and m. The second type 
calculates $\zeta_D(1 - 2m)$ in a range of D for each m before going on to the next m.
If one keeps a table of intermediate values as described in Section 3 of [14], this 
algorithm can calculate each value in time 0(m'^^^^ L{D)^^^^) when amortized over 
both D and m, where L(x) is a subexponential function corresponding to a choice
of factoring routine used in the calculation, e.g. $L(x) = e^{c(\log x)^{1/3}(\log\log x)^{2/3}}$ for
the number field sieve. The facts that the amortized times are subpolynomial in D 
and that a range of D are calculated for each m suggest using this algorithm. 

In fact, one might suppose that one could always use m = 1 and search until an 
appropriate D is found without ever going on to the next m. However, one other 
factor needs to be taken into account. The size of the numerator of $\zeta_D(1 - 2m)$
is $O(m(\lg m + \lg D))$ bits (see [12]), so $\zeta_D(1 - 2m)$ is much more likely to have
large prime factors for large m than for small m. The same issue comes up for D, 
of course, but to a lesser degree and in a way which does not greatly affect this 
algorithm, since a large number of values of D are used for each m. 

To make sure that we can avoid getting stuck in a range where the values of the
numerator of $\zeta_D(1 - 2m)$ are too small, we will give our algorithm parameters $M_1$,
$D_1$, and $D_2$ such that we always have $m \ge M_1$ and $D_1 \le D \le D_2$. In Section 3 we
will explain some conjectures which imply that for each pair (D, m), the chance that
each prime between P and cP divides $\zeta_D(1 - 2m)$ is approximately 2/P. Given
this, the Prime Number Theorem implies that the chance that some prime between
P and cP divides $\zeta_D(1 - 2m)$ is approximately $2(c - 1)/\log P$.
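The estimate above can be compared with a direct sum over the primes in the interval. This is a rough sketch under the paper's heuristic only: the function names are hypothetical, the per-prime chance is taken to be 2/p, and the chances are simply added, which is reasonable while the total stays small.

```python
from math import isqrt, log


def primes_between(lo, hi):
    """Primes p with lo <= p <= hi, found with a sieve of Eratosthenes."""
    sieve = bytearray([1]) * (hi + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, isqrt(hi) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytes(len(range(i * i, hi + 1, i)))
    return [p for p in range(max(lo, 2), hi + 1) if sieve[p]]


def heuristic_hit_probability(P, c):
    """Closed-form estimate 2(c - 1)/log P quoted in the text."""
    return 2 * (c - 1) / log(P)


def summed_hit_probability(P, c):
    """Sum of the per-prime chances 2/p over primes between P and cP."""
    return sum(2 / p for p in primes_between(P, int(c * P)))


if __name__ == "__main__":
    for c in (1.1, 1.6, 2):
        print(c, heuristic_hit_probability(10**5, c), summed_hit_probability(10**5, c))
```

For c = 2 the summed value comes out somewhat smaller than the closed form, because the closed form uses 2/P for every prime in the interval while 2/p is smaller for p > P.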

Now if we are trying $D_1 \le D \le D_2$ for each m, we see that the chance that we
find a suitable D for any given m is approximately

$$\frac{3}{\pi^2}\,(D_2 - D_1)\,\frac{2(c - 1)}{\log P}$$
Table 1. Time in minutes to find the first suitable pair (D, m)
with given parameters

  c      P         D_2     (D, m)      p          min.
  1.01   10^5              (4156, 2)   100391     27
  1.1    10^5      10^?    (697, 2)    106681     33
  1.6    10^5      10^?    (205, 2)    113173     81
  2      10^5      10^?    (184, 2)    164999     82
  2      10^6      300     (40, 3)     1264807    169
  2      10^6      10^?    (380, 2)    1017299    191
  2      10^6      10^?    (380, 2)    1017299    191
  2      2·10^6    10^?    (317, 2)    2027569    569


since the asymptotic density of fundamental discriminants in the integers is $3/\pi^2$.
However, if we consider the typical case c = 2, then we see that we only have to
choose $D_2 - D_1$ to be on the order of magnitude of log P for the expected probability
of success on any given m to be 1! Thus with $D_2 - D_1$ reasonably large, the expected
time using this strategy is




where the added term of P accounts for the time it takes to check whether each p
divides each computed $\zeta_D(1 - 2m)$. Table 1 provides some actual timing examples
of this algorithm, running on a Pentium III computer using the Linux operating
system and the GP-Pari interpreted language. (See [2].) In all cases $D_1 = 5$ and
$M_1 = 2$.
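The role of the density $3/\pi^2$ in this section can be illustrated by counting fundamental discriminants directly. The sketch below is illustrative only: the function names are hypothetical, and `expected_successes` assumes the expected number of suitable D for one m is the approximate count of fundamental discriminants in the range times the per-pair chance $2(c - 1)/\log P$.

```python
from math import isqrt, log, pi


def is_squarefree(n):
    """True if no square greater than 1 divides n."""
    return all(n % (d * d) for d in range(2, isqrt(n) + 1))


def is_fundamental_discriminant(D):
    """True for positive fundamental discriminants D > 1."""
    if D <= 1:
        return False
    if D % 4 == 1:
        return is_squarefree(D)
    if D % 4 == 0:
        m = D // 4
        return m % 4 in (2, 3) and is_squarefree(m)
    return False


def expected_successes(D1, D2, P, c):
    """Assumed estimate: (3/pi^2)(D2 - D1) * 2(c - 1)/log P."""
    return (3 / pi**2) * (D2 - D1) * 2 * (c - 1) / log(P)


if __name__ == "__main__":
    count = sum(is_fundamental_discriminant(D) for D in range(2, 10**4))
    print(count / 10**4, 3 / pi**2)
    print(expected_successes(5, 300, 10**6, 2))
```

Under this assumption, with c = 2 the estimate reaches 1 once $D_2 - D_1$ is about $(\pi^2/6)\log P$, which is the order of magnitude stated in the text.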

3. The Hypotheses: Conjectures and Previous Results 

The hypotheses mentioned in Section 2 stem from the conjecture, made (not
very explicitly) by Siegel in [17], that the numerators of the Bernoulli numbers $B_{2m}$
were evenly distributed modulo p for any odd prime p. Siegel used the conjecture to
derive a conjectural density for the irregular primes. (Lehmer seems to have done
the same thing in [16] but only gives the density.) Siegel's hypothesis was used more
generally by Johnson ([15]) and independently by Wooldridge ([20, Chap. III]) to
predict the density of primes with a given index of irregularity, that is, such that
p divides a given number of the Bernoulli numbers $B_2, \ldots, B_{p-3}$. It also comes in
handy for predicting many other values that are related to irregular primes, such
as the order of magnitude of the first prime of a given index of irregularity. (See,
for example, [18].) Since $B_{2m} = -\zeta(1 - 2m)(2m)$, it is equivalent to say that the
values of $\zeta(1 - 2m)$ are evenly distributed modulo p, where $\zeta(s)$ is the Riemann
zeta function.

Little or no progress has been made on proving Siegel's hypothesis, but a great
deal of data has been collected, especially in regard to the prediction of Johnson
and Wooldridge. Specifically, this prediction says that as $p \to \infty$, the probability
that p has index of irregularity r goes to

$$\left(\frac{1}{2}\right)^{r} \frac{e^{-1/2}}{r!}.$$

(In addition to the original sources, the details may be found in Section 5.3 of [19].)

Note that this prediction does not rely on the full strength of Siegel's hypothesis,
but merely on the weaker hypothesis that the Bernoulli numbers are 0 modulo p
with probability 1/p. The assumptions made in Section 2 relate only to predictions
about indices of irregularity based on this weaker hypothesis.
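The predicted fractions follow from the weaker hypothesis by a short calculation: if each of the (p - 3)/2 Bernoulli numbers $B_2, \ldots, B_{p-3}$ is independently divisible by p with probability 1/p, the index of irregularity is binomially distributed, and the binomial distribution tends to a Poisson distribution with mean 1/2. The sketch below (hypothetical function names; independence is an assumption of the heuristic) evaluates both.

```python
from math import comb, exp, factorial


def predicted_fraction(r):
    """Limiting fraction of primes with index r: (1/2)^r e^(-1/2) / r!."""
    return 0.5**r * exp(-0.5) / factorial(r)


def binomial_fraction(p, r):
    """Chance of exactly r hits among (p - 3)/2 trials, each with chance 1/p."""
    n = (p - 3) // 2
    q = 1 / p
    return comb(n, r) * q**r * (1 - q) ** (n - r)


if __name__ == "__main__":
    for r in range(6):
        print(r, round(predicted_fraction(r), 6), round(binomial_fraction(10007, r), 6))
```

The limiting values for r = 0, ..., 5 agree to six decimal places with the "predicted fraction" columns of the tables in Section 4.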

Wagstaff, in [18], computed $u_r(x)$, the fraction of primes not exceeding x with
index r of irregularity, for each r between 0 and 2 and for all $r \ge 3$ grouped together,
and compared this distribution to the predicted distribution for each multiple x of
1000 up to 125000. The result of the chi-squared test "fluctuated usually between
0.1 and 1.0 and had the value 0.29 at x = 125000. It was 0.03 at x = 8000"
[18]. These results correspond to significance levels of .992, .801, .962, and .999,
respectively. (The significance levels used in this paper correspond roughly to the
probability that the agreement between the observed results and the predicted
results is not due to chance. Statisticians consider the threshold for considering a
result to be not due to chance to be a significance level of .9 to .95. Since we are
not actually conducting a valid statistical study in this paper, all of the statistical
results should be taken with a very large grain of salt.)
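The quoted significance levels can be reproduced from the chi-squared values. The sketch below assumes (this is inferred from the four categories r = 0, 1, 2, and $r \ge 3$, not stated in the text) that the level is the upper-tail probability of a chi-squared distribution with 3 degrees of freedom, for which a closed form in terms of the complementary error function is available. The function name is hypothetical.

```python
from math import erfc, exp, pi, sqrt


def chi2_upper_tail_3dof(x):
    """P(X > x) for X chi-squared distributed with 3 degrees of freedom."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)


if __name__ == "__main__":
    for x in (0.1, 1.0, 0.29, 0.03):
        print(x, round(chi2_upper_tail_3dof(x), 3))
```

Under this assumption the four values 0.1, 1.0, 0.29, and 0.03 give .992, .801, .962, and .999, matching the levels quoted above.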

Buhler, Crandall, Ernvall, and Metsänkylä hold the record for computations
with irregular primes, having found all the irregular primes below four million as
described in [7]. They do not seem to have done a chi-squared analysis, but they
tabulate the values of $u_r(x)$ for x = 4000000 and r between 0 and 7. A chi-squared
test using the same methodology as before has the result 1.02, for a significance level
of .796. Earlier, in [8], Buhler, Crandall, and Sompolski tabulated the same data
for x = 1000000. The result of the same chi-squared test is 0.78, for a significance
level of .854.

Unfortunately, the only way to collect data to test Siegel's hypothesis is to
investigate $B_{2m}$ for larger and larger m, which is very computationally intensive.
(See [1] or [10] for details.)

However, in the more general number field case, there are many more dimensions
to the problem. We start by restricting our attention to the case of k an abelian
totally real number field. Then we know that

$$\zeta_k(s) = \prod_{\chi \in \hat{G}} L(s, \chi)$$

where $\hat{G}$ is the character group of $G = \mathrm{Gal}(k/\mathbb{Q})$ and $L(s, \chi)$ is the L-function
associated with the character $\chi$. Note that $L(s, 1) = \zeta(s)$, so the Riemann zeta
function is a factor of the zeta function for k. (See [9], e.g., for more details.)
Certainly it seems likely that for a fixed (totally real) number field k and character
$\chi$ the values of the numerator of $L(1 - 2m, \chi)$ are evenly distributed modulo p as m
varies. (It is known that these values are rational numbers.) We also hypothesize
that these values for different $\chi$ are independent, which implies that the numerators
of $\zeta_k(1 - 2m)$ are distributed modulo p like the product of |G| independent integer
variables, each of which is evenly distributed modulo p. We will refer to this as
the "product distribution", for lack of a better term. However, it also is reasonable
to conjecture that for a fixed m the values of $\zeta_k(1 - 2m)$ are distributed according
to the product distribution modulo p as k varies. More precisely, if we fix m and
the degree of k we expect the values to be distributed according to the product
distribution modulo p as the discriminant of k varies. Alternatively, if we fix m
and the discriminant of k we expect the values to be distributed according to the
product distribution modulo p as the degree varies.
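For small p the "product distribution" can be tabulated by brute force. The sketch below (hypothetical function name) enumerates all n-tuples of residues modulo p, treats each tuple as equally likely, and records how often each residue occurs as the product.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def product_distribution(p, n):
    """Distribution mod p of a product of n independent uniform residues mod p."""
    counts = Counter()
    for values in product(range(p), repeat=n):
        r = 1
        for v in values:
            r = r * v % p
        counts[r] += 1
    total = p**n
    return {r: Fraction(counts[r], total) for r in range(p)}


if __name__ == "__main__":
    dist = product_distribution(5, 2)
    for residue, chance in dist.items():
        print(residue, chance)
```

The residue 0 occurs with probability $1 - (1 - 1/p)^n$; for n = 2 this equals $2/p - 1/p^2$, and the nonzero residues share the remaining probability equally.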

In this paper we will be considering the former situation. As in the previous
sections, we fix the degree at 2, and let $k = \mathbb{Q}(\sqrt{D})$ be a real quadratic field with
zeta function $\zeta_D(s)$. In this case

$$\zeta_D(s) = L(s, 1)L(s, \chi) = \zeta(s)L(s, \chi)$$

where $\chi(\cdot) = \left(\frac{D}{\cdot}\right)$, the Kronecker symbol, where appropriate.
In addition to the above definitions we will make one more set:

Definition 2. Let $\chi$ be as above and let $\delta$ be as in Section 1. We will say that
p is $\chi$-regular if p is relatively prime to $L(1 - 2m, \chi)$ for all integers m such that
$2 \le 2m \le \delta - 2$ and also p is relatively prime to $p\,L(1 - \delta, \chi)$. The number of
such L-values that are divisible by p will be the index of $\chi$-irregularity of p.
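For small D and m the L-values in this definition can be computed exactly. The sketch below is not the algorithm of [14]; it uses the standard identity $L(1 - n, \chi) = -B_{n,\chi}/n$ with generalized Bernoulli numbers, which is brought in here as an assumption from outside the text. All function names are hypothetical, and the count covers only the range $2 \le 2m \le p - 3$ (it assumes $\delta = p - 1$ and omits the final term of the definition).

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(n):
    """Return [B_0, ..., B_n] as exact fractions (convention B_1 = -1/2)."""
    B = [Fraction(1)]
    for m in range(1, n + 1):
        B.append(-sum(comb(m + 1, k) * B[k] for k in range(m)) / (m + 1))
    return B


def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def kronecker(D, a):
    """Kronecker symbol (D/a) for a > 0."""
    if a % 2 == 0 and D % 2 == 0:
        return 0
    sign = 1
    while a % 2 == 0:
        a //= 2
        if D % 8 in (3, 5):  # (D/2) = -1 when D is 3 or 5 mod 8
            sign = -sign
    return sign * jacobi(D, a)


def L_value(D, m):
    """L(1 - 2m, chi_D) for a positive fundamental discriminant D."""
    n = 2 * m
    B = bernoulli_numbers(n)
    total = Fraction(0)
    for a in range(1, D + 1):
        chi = kronecker(D, a)
        if chi:
            x = Fraction(a, D)  # evaluate the Bernoulli polynomial B_n at a/D
            total += chi * sum(comb(n, k) * B[k] * x ** (n - k) for k in range(n + 1))
    return -(D ** (n - 1)) * total / n


def zeta_D_value(D, m):
    """zeta_D(1 - 2m) = zeta(1 - 2m) * L(1 - 2m, chi_D)."""
    return -bernoulli_numbers(2 * m)[2 * m] / (2 * m) * L_value(D, m)


def chi_irregular_count(p, D):
    """Number of m with 2 <= 2m <= p - 3 and p dividing the numerator of L(1 - 2m, chi_D)."""
    return sum(1 for m in range(1, (p - 3) // 2 + 1) if L_value(D, m).numerator % p == 0)


if __name__ == "__main__":
    print(L_value(5, 1), L_value(5, 2), zeta_D_value(5, 1))
```

Exact rational arithmetic keeps the divisibility test unambiguous, at the cost of being far too slow for the ranges of D and p considered in Section 4.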

Saying that the values of $\zeta_k(1 - 2m)$ are distributed according to the product
distribution and that the values of $\zeta(1 - 2m)$ are evenly distributed is the same as
saying that the values of $L(1 - 2m, \chi)$ are evenly distributed modulo p. Then we
can make the same prediction about the indices of $\chi$-irregularity that Johnson and
Wooldridge made about the indices of irregularity in the rational case. We briefly
investigated this issue in [12], where there are tables of the analogue of $u_r(x)$ (using
the index of $\chi$-irregularity) for x = 1000, r from 0 to 4, and D = 5, 8, 12, and 13.
The chi-squared test results are not included, but using the methodology discussed
earlier they are 3.32, 1.74, 1.15, and 2.54. The corresponding significance levels are
.345, .628, .765, and .469, respectively. We could total the values of (the analogue
of) $u_r(x)$ for the four values of D and compare them to the predicted values; we
might expect that this would give us a better significance level because of the
larger "sample size". However, in this case the chi-squared result is 3.53 and the
significance level is .316, which is worse than any of the results for the values of
D taken separately! This may be due to some small second-order bias which is
common to each sample and thus is reinforced when they are pooled together.

4. The Hypotheses: New Results 

In the course of testing the algorithms in [14], we collected more data in addition
to that above. Table 2 shows the number of primes less than 5000 which have
$\chi$-index of irregularity r for various values of r and D = 5. We compared the observed
and predicted distributions, using the methodology above, for primes below x where
x was 1000, 2000, 3000, 4000, and 5000, and found chi-squared values of 3.32,
5.03, 2.51, 1.73, and 2.10 and significance levels of .344, .170, .473, .630, and .552,
respectively.

Table 2. Results for D = 5 and p < 5000

  r    number    predicted number    predicted fraction
  0    422       405.16              .606531
  1    186       202.58              .303265
  2    51        50.65               .075816
  3    7         8.44                .012636
  4    2         1.06                .001580
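The chi-squared statistic for tabulated counts like these can be recomputed in a few lines. The sketch below assumes, following the description of Wagstaff's test in Section 3, that all indices $r \ge 3$ are pooled into a single category; the function names are hypothetical.

```python
from math import exp, factorial


def predicted_fraction(r):
    """Predicted fraction of primes with index r: (1/2)^r e^(-1/2) / r!."""
    return 0.5**r * exp(-0.5) / factorial(r)


def chi_squared(observed):
    """Chi-squared statistic for counts observed[r], with all r >= 3 pooled."""
    total = sum(observed)
    pooled = list(observed[:3]) + [sum(observed[3:])]
    fractions = [predicted_fraction(r) for r in range(3)]
    fractions.append(1 - sum(fractions))  # everything with r >= 3
    return sum((o - total * f) ** 2 / (total * f) for o, f in zip(pooled, fractions))


if __name__ == "__main__":
    print(chi_squared([422, 186, 51, 7, 2]))               # counts for D = 5, p < 5000
    print(chi_squared([21864, 11596, 2529, 347, 41, 7]))   # totals for D < 5000, p < 100
```

With this pooling the counts for D = 5 and p < 5000 give a value of about 2.10, and the totals for D < 5000 and p < 100 give about 81.1, in agreement with the chi-squared values reported in this section.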
Other data was obtained using the philosophy, described in Section 2, of computing
the values of $L(1 - 2m, \chi)$ for large numbers of D and relatively small values
of m. As in the discussion of D = 5, 8, 12, and 13 above, we present the total across
the different discriminants. Table 3 presents the data for all D < 5000 and p < 100.
The chi-squared value for the totals is 81.1 and the significance level is .000.

Table 3. Results for D < 5000 and p < 100

  r    total number    predicted total number    predicted fraction
  0    21864           22068.01                  .606531
  1    11596           11034.01                  .303265
  2    2529            2758.50                   .075816
  3    347             459.75                    .012636
  4    41              57.47                     .001580
  5    7               5.75                      .000158


However, if we view the data broken down by prime, as in Table 4, we see 
that a large part of the contribution to the chi-squared value is from small primes. 
The values of p shown in the table were selected with an eye towards showing 
both a trend toward smaller chi-squared values as p increases and also some of the 
exceptions. We hope to make the nature of the small-prime contribution clearer in 
the future. 

Table 4. Results for D < 5000; selected values of p

  r               0         1         2         >= 3     sig. level
  pred.           919.50    459.75    114.94    21.81
  p = 3           876       640       0         0        .000
  p = 5           956       500       60        0        .000
  p = 7           895       530       89        2        .000
  p = 11          876       497       131       12       .008
  p = 13          947       467       91        11       .010
  p = 17 or 19    950       452       95        19       .175
  p = 23          933       462       106       15       .387
  p = 37          913       468       108       27       .605
  p = 47          911       476       109       20       .775
  p = 67          915       466       114       21       .986
  p = 79          859       487       144       26       .003
  p = 97          909       468       122       17       .623


5. Practical Notes 

The hypotheses that p divides $\zeta(1 - 2m)$ with probability 1/p and that p divides
$L(1 - 2m, \chi)$ with the same probability, together with the independence hypothesis
of Section 3, clearly imply that p divides $\zeta_D(1 - 2m) = \zeta(1 - 2m)L(1 - 2m, \chi)$
with probability $(2/p) - (1/p^2)$, or approximately 2/p for
very large p, as we claimed in Section 2. Also, for the algorithms in that section
one doesn't really have to worry about the possibility that $\delta$ is not p - 1, since this
would require D = p. However, we showed that $D_2 - D_1$ can be on the order of
magnitude of log p, so the case of D = p can only arise as the result of what can
only be called bad planning.

As mentioned in the introduction, one use for the algorithms of this paper may be 
to find D and p such that the class group of $\mathbb{Q}(\sqrt{D}, \zeta_p)$ can be used for cryptographic
protocols. In [5], Buchmann and Paulus introduced a one way function based on
class groups of number fields and noted that such a function could be used to
implement Diffie-Hellman key exchanges and ElGamal signature schemes, to take
two examples. These ideas are expanded on in [3], which introduces a signature 
scheme called RDSA which is based on taking p-th roots in the class group of a 
number field or other abelian group. Here p is a random prime number which (one 
assumes) does not divide the order of the group. One advantage of this signature 
scheme is that it is unnecessary and in fact undesirable to know precisely the order 
of the group, a situation which frequently occurs with class groups and in fact is 
generally true for the class groups found with the algorithms above. 

Given that the order of the class group is unknown, the question of which class
groups are suitable for these protocols is addressed in [11], which gives two necessary
conditions on the class number:

• The class number must be sufficiently large. This should make it difficult to
determine the class number or discrete logarithms using exhaustive search,
Pollard Rho, Baby-Step-Giant-Step, Hafner-McCurley, or index calculus.

• The class number must have at least one sufficiently large prime divisor. This 
should make it difficult to find discrete logarithms using a Pohlig-Hellman 
attack. 
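The second condition can be illustrated with a toy Pohlig-Hellman computation. The sketch below is only an analogy: it works in the multiplicative group modulo a small prime rather than in a class group, the function names are hypothetical, and each prime-power subgroup is searched by brute force. The point is that the work is governed by the prime-power factors of the group order, not by the order itself.

```python
def factorize(n):
    """Prime factorization of n by trial division, as {prime: exponent}."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def pohlig_hellman(g, h, p):
    """Find x with g^x = h (mod p), p prime, working one prime power of p - 1 at a time."""
    n = p - 1
    x, modulus = 0, 1
    for q, e in factorize(n).items():
        qe = q**e
        g_i = pow(g, n // qe, p)  # element of order dividing q^e
        h_i = pow(h, n // qe, p)
        acc = 1
        for x_i in range(qe):     # brute-force search in the small subgroup
            if acc == h_i:
                break
            acc = acc * g_i % p
        else:
            raise ValueError("h is not a power of g")
        # Chinese remainder step: keep x mod modulus, force x = x_i mod q^e
        t = (x_i - x) * pow(modulus, -1, qe) % qe
        x += modulus * t
        modulus *= qe
    return x


if __name__ == "__main__":
    p, g = 8101, 6                # p - 1 = 2^2 * 3^4 * 5^2
    h = pow(g, 6689, p)
    x = pohlig_hellman(g, h, p)
    print(x, pow(g, x, p) == h)
```

Here the group has order 8100, but the search visits at most 4 + 81 + 25 elements. If the order had a large prime factor q, the corresponding subgroup search would cost on the order of q steps (or about the square root of q with a baby-step giant-step search), which is why a large prime divisor of the class number is required.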

As we have seen, the algorithms of this paper allow us to find a class number with 
a prime divisor as large as desired, and thus with the class number itself as large 
as desired. 

The drawback is that the amount of time and space needed to carry out the 
cryptographic protocols in these groups can also be very large. The papers [5] 
and [11] explain how to represent the objects necessary to compute with. Since 
elements in the class group are equivalence classes of ideals in a ring of integers, 
we need to store a Z-basis for the ring of integers of $\mathbb{Q}(\sqrt{D}, \zeta_p)$. As noted in [5]
and explained in more detail in [6], this requires $(\log |\Delta|)^{O(1)}$ bits of storage, where
$\Delta$ is the discriminant of $\mathbb{Q}(\sqrt{D}, \zeta_p)$. Unfortunately, it is not hard to show that

$p \ne D$). (See [19], for instance.) Thus the Z-basis requires $(p \log D)^{O(1)}$ bits of

storage. Furthermore, as explained in [11], an ideal class should be represented by
a member of the class which is LLL-reduced; that is, by one which corresponds
to an LLL-reduced lattice under Minkowski's embedding. Such a representation
requires $(n + \log |\Delta|)^{O(1)}$ bits of storage, where n is the degree of the field and $\Delta$
is as before. (See [6] and [4] for details.) In our case n = p - 1 (assuming $p \ne D$)
so this again requires $(p \log D)^{O(1)}$ bits of storage. Of course, this means that the
time it takes to carry out the basic algorithms for the class group is also generally
going to be exponential in the size of p. Whether this situation is bad enough to
preclude the use of our fields is not yet clear.
6. Conclusion and Future Work

Much of the future work described in [14] still remains to be done; in particular
many improvements could be made in the implementations of the algorithms and
perhaps in the algorithms themselves. However, the results already seem encouraging.
With a faster implementation, the use of the search algorithm of Section 2
to find class groups large enough for secure cryptography seems quite feasible,
although this should be tested in practice. More importantly, an implementation
of one or more cryptographic protocols needs to be done using the class groups
we have described in order to determine whether secure cryptography can be done
sufficiently quickly in these groups.

The data collected in Section 4 is also encouraging, but clearly more is necessary.
The author hopes to implement and run his algorithms on a true supercomputer in
the near future. The data produced by this will undoubtedly give a clearer picture
of the phenomena so far observed, perhaps leading to refinements of our hypotheses.